TOURIST SPENDING PREDICTION MODEL

CONTEXT: The tourism sector plays a significant role in the Tanzanian economy, contributing about 17% of the country’s GDP and 25% of all foreign exchange revenues. The sector, which provides direct employment for more than 600,000 people and indirect employment for up to 2 million, generated approximately $2.4 billion in 2018 according to government statistics. Tanzania received a record 1.1 million international visitor arrivals in 2014, mostly from Europe, the US and Africa. Tanzania is the only country in the world that has allocated more than 25% of its total area to wildlife, national parks and protected areas. The country has 16 national parks, 28 game reserves, 44 game-controlled areas, two marine parks and one conservation area.

AIM: The aim of this project is to explore the data and build a linear regression model that predicts the spending behaviour of tourists visiting Tanzania. Tour operators and the Tanzania Tourism Board could use the model to automatically help tourists around the world estimate their expenditure before visiting Tanzania.

In [1]:
#IMPORTING NECESSARY LIBRARIES
import pandas as pd
import numpy as np
In [2]:
import seaborn as sns
import matplotlib.pyplot as plt
In [3]:
%matplotlib inline
In [4]:
#Importing dataset for the analysis.
Tz = pd.read_csv("Train .csv")
Tz
Out[4]:
ID country age_group travel_with total_female total_male purpose main_activity info_source tour_arrangement ... package_transport_tz package_sightseeing package_guided_tour package_insurance night_mainland night_zanzibar payment_mode first_trip_tz most_impressing total_cost
0 tour_0 SWIZERLAND 45-64 Friends/Relatives 1.0 1.0 Leisure and Holidays Wildlife tourism Friends, relatives Independent ... No No No No 13.0 0.0 Cash No Friendly People 674602.5
1 tour_10 UNITED KINGDOM 25-44 NaN 1.0 0.0 Leisure and Holidays Cultural tourism others Independent ... No No No No 14.0 7.0 Cash Yes Wonderful Country, Landscape, Nature 3214906.5
2 tour_1000 UNITED KINGDOM 25-44 Alone 0.0 1.0 Visiting Friends and Relatives Cultural tourism Friends, relatives Independent ... No No No No 1.0 31.0 Cash No Excellent Experience 3315000.0
3 tour_1002 UNITED KINGDOM 25-44 Spouse 1.0 1.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Package Tour ... Yes Yes Yes No 11.0 0.0 Cash Yes Friendly People 7790250.0
4 tour_1004 CHINA 1-24 NaN 1.0 0.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Independent ... No No No No 7.0 4.0 Cash Yes No comments 1657500.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4804 tour_993 UAE 45-64 Alone 0.0 1.0 Business Hunting tourism Friends, relatives Independent ... No No No No 2.0 0.0 Credit Card No No comments 3315000.0
4805 tour_994 UNITED STATES OF AMERICA 25-44 Spouse 1.0 1.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Package Tour ... Yes Yes Yes Yes 11.0 0.0 Cash Yes Friendly People 10690875.0
4806 tour_995 NETHERLANDS 1-24 NaN 1.0 0.0 Leisure and Holidays Wildlife tourism others Independent ... No No No No 3.0 7.0 Cash Yes Good service 2246636.7
4807 tour_997 SOUTH AFRICA 25-44 Friends/Relatives 1.0 1.0 Business Beach tourism Travel, agent, tour operator Independent ... No No No No 5.0 0.0 Credit Card No Friendly People 1160250.0
4808 tour_999 UNITED KINGDOM 25-44 Spouse 1.0 1.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Package Tour ... Yes Yes Yes No 4.0 7.0 Cash Yes Friendly People 13260000.0

4809 rows × 23 columns

In [5]:
#Showing the first 20 records
Tz.head(20)
Out[5]:
ID country age_group travel_with total_female total_male purpose main_activity info_source tour_arrangement ... package_transport_tz package_sightseeing package_guided_tour package_insurance night_mainland night_zanzibar payment_mode first_trip_tz most_impressing total_cost
0 tour_0 SWIZERLAND 45-64 Friends/Relatives 1.0 1.0 Leisure and Holidays Wildlife tourism Friends, relatives Independent ... No No No No 13.0 0.0 Cash No Friendly People 674602.5
1 tour_10 UNITED KINGDOM 25-44 NaN 1.0 0.0 Leisure and Holidays Cultural tourism others Independent ... No No No No 14.0 7.0 Cash Yes Wonderful Country, Landscape, Nature 3214906.5
2 tour_1000 UNITED KINGDOM 25-44 Alone 0.0 1.0 Visiting Friends and Relatives Cultural tourism Friends, relatives Independent ... No No No No 1.0 31.0 Cash No Excellent Experience 3315000.0
3 tour_1002 UNITED KINGDOM 25-44 Spouse 1.0 1.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Package Tour ... Yes Yes Yes No 11.0 0.0 Cash Yes Friendly People 7790250.0
4 tour_1004 CHINA 1-24 NaN 1.0 0.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Independent ... No No No No 7.0 4.0 Cash Yes No comments 1657500.0
5 tour_1005 UNITED KINGDOM 25-44 NaN 0.0 1.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Package Tour ... No Yes Yes No 9.0 3.0 Cash Yes Wildlife 120950.0
6 tour_1007 SOUTH AFRICA 45-64 Alone 0.0 1.0 Business Mountain climbing Friends, relatives Independent ... No No No No 9.0 0.0 Cash Yes Friendly People 466140.0
7 tour_1008 UNITED STATES OF AMERICA 45-64 Friends/Relatives 1.0 1.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Package Tour ... Yes Yes Yes Yes 10.0 3.0 Cash Yes Friendly People 3480750.0
8 tour_101 NIGERIA 25-44 Alone 0.0 1.0 Leisure and Holidays Cultural tourism Travel, agent, tour operator Independent ... No No No No 4.0 0.0 Cash Yes NaN 994500.0
9 tour_1011 INDIA 25-44 Alone 1.0 0.0 Business Wildlife tourism Travel, agent, tour operator Independent ... No No No No 5.0 0.0 Credit Card Yes Friendly People 2486250.0
10 tour_1012 BRAZIL 25-44 Spouse 1.0 1.0 Leisure and Holidays Wildlife tourism Radio, TV, Web Independent ... No No No No 17.0 3.0 Cash Yes Wonderful Country, Landscape, Nature 1117155.0
11 tour_1013 CANADA 45-64 Children 2.0 0.0 Leisure and Holidays Beach tourism Friends, relatives Independent ... No No No No 30.0 0.0 Cash No Excellent Experience 8121750.0
12 tour_1016 CANADA 45-64 Children 0.0 2.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Independent ... No No No No 11.0 3.0 Cash Yes No comments 331500.0
13 tour_1017 MALT 25-44 Friends/Relatives 2.0 0.0 Leisure and Holidays Wildlife tourism Friends, relatives Package Tour ... Yes No No No 10.0 0.0 Cash Yes No comments 11346650.0
14 tour_1018 MOZAMBIQUE 25-44 Alone 0.0 1.0 Visiting Friends and Relatives Beach tourism Friends, relatives Independent ... No No No No 2.0 0.0 Cash Yes Wildlife 497250.0
15 tour_102 RWANDA 65+ Alone 1.0 0.0 Leisure and Holidays Beach tourism Friends, relatives Independent ... No No No No 0.0 2.0 Cash Yes Wonderful Country, Landscape, Nature 331500.0
16 tour_1021 AUSTRIA 45-64 Friends/Relatives 4.0 1.0 Visiting Friends and Relatives Mountain climbing Friends, relatives Independent ... No No No No 24.0 0.0 Cash No Friendly People 2000000.0
17 tour_1022 MYANMAR 25-44 NaN 1.0 0.0 Meetings and Conference Wildlife tourism Radio, TV, Web Independent ... No No No No 5.0 0.0 Cash Yes Friendly People 331500.0
18 tour_1024 GERMANY 25-44 Children 1.0 1.0 Visiting Friends and Relatives Cultural tourism Friends, relatives Independent ... No No No No 3.0 0.0 Cash Yes Friendly People 2269330.0
19 tour_1026 KENYA 25-44 NaN 1.0 0.0 Business Mountain climbing Friends, relatives Independent ... No No No No 4.0 0.0 Cash No Friendly People 377520.0

20 rows × 23 columns

In [6]:
#Information about the tabular data.
Tz.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4809 entries, 0 to 4808
Data columns (total 23 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   ID                     4809 non-null   object 
 1   country                4809 non-null   object 
 2   age_group              4809 non-null   object 
 3   travel_with            3695 non-null   object 
 4   total_female           4806 non-null   float64
 5   total_male             4804 non-null   float64
 6   purpose                4809 non-null   object 
 7   main_activity          4809 non-null   object 
 8   info_source            4809 non-null   object 
 9   tour_arrangement       4809 non-null   object 
 10  package_transport_int  4809 non-null   object 
 11  package_accomodation   4809 non-null   object 
 12  package_food           4809 non-null   object 
 13  package_transport_tz   4809 non-null   object 
 14  package_sightseeing    4809 non-null   object 
 15  package_guided_tour    4809 non-null   object 
 16  package_insurance      4809 non-null   object 
 17  night_mainland         4809 non-null   float64
 18  night_zanzibar         4809 non-null   float64
 19  payment_mode           4809 non-null   object 
 20  first_trip_tz          4809 non-null   object 
 21  most_impressing        4496 non-null   object 
 22  total_cost             4809 non-null   float64
dtypes: float64(5), object(18)
memory usage: 864.2+ KB
In [7]:
#Statistical description of numerical features.
Tz.describe()
Out[7]:
total_female total_male night_mainland night_zanzibar total_cost
count 4806.000000 4804.000000 4809.000000 4809.000000 4.809000e+03
mean 0.926758 1.009575 8.488043 2.304429 8.114389e+06
std 1.288242 1.138865 10.427624 4.227080 1.222490e+07
min 0.000000 0.000000 0.000000 0.000000 4.900000e+04
25% 0.000000 1.000000 3.000000 0.000000 8.121750e+05
50% 1.000000 1.000000 6.000000 0.000000 3.397875e+06
75% 1.000000 1.000000 11.000000 4.000000 9.945000e+06
max 49.000000 44.000000 145.000000 61.000000 9.953288e+07

FINDING AND REPLACING MISSING/NULL VALUES

In [8]:
Tz.isnull().sum()
Out[8]:
ID                          0
country                     0
age_group                   0
travel_with              1114
total_female                3
total_male                  5
purpose                     0
main_activity               0
info_source                 0
tour_arrangement            0
package_transport_int       0
package_accomodation        0
package_food                0
package_transport_tz        0
package_sightseeing         0
package_guided_tour         0
package_insurance           0
night_mainland              0
night_zanzibar              0
payment_mode                0
first_trip_tz               0
most_impressing           313
total_cost                  0
dtype: int64

The travel_with, total_female, total_male and most_impressing columns contain NaN/null values that need to be filled.
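A minimal sketch of the same fill strategy on a small hypothetical frame (the values are illustrative, not taken from Train .csv). It assigns each filled column back instead of calling `fillna(..., inplace=True)` on a column, which relies on chained assignment that recent pandas versions warn about, and uses `.bfill()` in place of the deprecated `fillna(method='bfill')`.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame with the same kinds of gaps as Tz (illustrative values only)
df = pd.DataFrame({
    "travel_with": ["Spouse", np.nan, "Alone", np.nan],
    "total_female": [1.0, np.nan, 0.0, 1.0],
    "total_male": [1.0, 0.0, np.nan, 0.0],
    "most_impressing": ["Wildlife", np.nan, "Friendly People", "Good service"],
})

# Categorical gaps: fill with a fixed label and assign the result back to the column
df["travel_with"] = df["travel_with"].fillna("Alone")
df["most_impressing"] = df["most_impressing"].fillna("Friendly People")

# Numeric gaps: backward fill copies the next row's value into each gap
df["total_female"] = df["total_female"].bfill()
df["total_male"] = df["total_male"].bfill()

print(df.isnull().sum().sum())
```

Note that a backward fill cannot fill a gap in the very last row, so the null check that follows is still worth running on the real data.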

In [9]:
#Travel_with
Tz['travel_with'] = Tz['travel_with'].fillna('Alone')
Tz
Out[9]:
ID country age_group travel_with total_female total_male purpose main_activity info_source tour_arrangement ... package_transport_tz package_sightseeing package_guided_tour package_insurance night_mainland night_zanzibar payment_mode first_trip_tz most_impressing total_cost
0 tour_0 SWIZERLAND 45-64 Friends/Relatives 1.0 1.0 Leisure and Holidays Wildlife tourism Friends, relatives Independent ... No No No No 13.0 0.0 Cash No Friendly People 674602.5
1 tour_10 UNITED KINGDOM 25-44 Alone 1.0 0.0 Leisure and Holidays Cultural tourism others Independent ... No No No No 14.0 7.0 Cash Yes Wonderful Country, Landscape, Nature 3214906.5
2 tour_1000 UNITED KINGDOM 25-44 Alone 0.0 1.0 Visiting Friends and Relatives Cultural tourism Friends, relatives Independent ... No No No No 1.0 31.0 Cash No Excellent Experience 3315000.0
3 tour_1002 UNITED KINGDOM 25-44 Spouse 1.0 1.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Package Tour ... Yes Yes Yes No 11.0 0.0 Cash Yes Friendly People 7790250.0
4 tour_1004 CHINA 1-24 Alone 1.0 0.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Independent ... No No No No 7.0 4.0 Cash Yes No comments 1657500.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4804 tour_993 UAE 45-64 Alone 0.0 1.0 Business Hunting tourism Friends, relatives Independent ... No No No No 2.0 0.0 Credit Card No No comments 3315000.0
4805 tour_994 UNITED STATES OF AMERICA 25-44 Spouse 1.0 1.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Package Tour ... Yes Yes Yes Yes 11.0 0.0 Cash Yes Friendly People 10690875.0
4806 tour_995 NETHERLANDS 1-24 Alone 1.0 0.0 Leisure and Holidays Wildlife tourism others Independent ... No No No No 3.0 7.0 Cash Yes Good service 2246636.7
4807 tour_997 SOUTH AFRICA 25-44 Friends/Relatives 1.0 1.0 Business Beach tourism Travel, agent, tour operator Independent ... No No No No 5.0 0.0 Credit Card No Friendly People 1160250.0
4808 tour_999 UNITED KINGDOM 25-44 Spouse 1.0 1.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Package Tour ... Yes Yes Yes No 4.0 7.0 Cash Yes Friendly People 13260000.0

4809 rows × 23 columns

In [10]:
#Most_impressing
Tz['most_impressing'] = Tz['most_impressing'].fillna('Friendly People')

In [11]:
#total_male
Tz['total_male'] = Tz['total_male'].bfill()

In [12]:
#total_female
Tz['total_female'] = Tz['total_female'].bfill()

In [13]:
#Confirming that no null values remain.
Tz.isnull().sum()
Out[13]:
ID                       0
country                  0
age_group                0
travel_with              0
total_female             0
total_male               0
purpose                  0
main_activity            0
info_source              0
tour_arrangement         0
package_transport_int    0
package_accomodation     0
package_food             0
package_transport_tz     0
package_sightseeing      0
package_guided_tour      0
package_insurance        0
night_mainland           0
night_zanzibar           0
payment_mode             0
first_trip_tz            0
most_impressing          0
total_cost               0
dtype: int64
In [14]:
sns.pairplot(Tz)
Out[14]:
[Figure: pairplot of total_female, total_male, night_mainland, night_zanzibar and total_cost]

The pairplot above is used to look for roughly linear relationships between the numeric features and total_cost, which would indicate whether a linear regression model is a reasonable choice.

In [15]:
Tz.plot.scatter(x='total_cost',y='night_mainland')
Out[15]:
[Figure: scatter plot with total_cost on the x-axis and night_mainland on the y-axis]
In [16]:
sns.lmplot(x='total_cost',y='night_mainland',data=Tz)
Out[16]:
[Figure: scatter plot of night_mainland against total_cost with a fitted regression line]
In [17]:
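#Chart showing the distribution of tourist spending (total_cost, in millions).

import warnings
warnings.filterwarnings('ignore')

#histplot with kde=True replaces the deprecated distplot
sns.histplot(Tz['total_cost']/10**6, kde=True).set(title='SPENDING DISTRIBUTION')
Out[17]:
[Figure: histogram with density curve of total_cost (millions), titled 'SPENDING DISTRIBUTION']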
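The describe() output earlier shows a mean total_cost of about 8.11 million against a median of about 3.40 million, which points to a right-skewed distribution. The sketch below shows how that skew can be checked numerically; the values are hypothetical, and the log(1 + x) comparison is only one commonly used option, not something this notebook applies.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed spending values in millions (illustrative, not from the dataset)
costs = pd.Series([0.5, 0.8, 1.2, 2.0, 3.4, 5.0, 9.9, 40.0, 99.0])

print(costs.mean(), costs.median())  # a mean well above the median suggests right skew
print(costs.skew())                  # sample skewness of the raw values
print(np.log1p(costs).skew())        # skewness after a log(1 + x) transform
```

On the notebook data the same check would be `Tz['total_cost'].skew()`.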
In [18]:
#Data correlation of features.

Tz.corr(numeric_only=True)
Out[18]:
total_female total_male night_mainland night_zanzibar total_cost
total_female 1.000000 0.467000 0.031233 0.138523 0.285862
total_male 0.467000 1.000000 -0.041369 0.050172 0.183785
night_mainland 0.031233 -0.041369 1.000000 -0.118155 0.020473
night_zanzibar 0.138523 0.050172 -0.118155 1.000000 0.145139
total_cost 0.285862 0.183785 0.020473 0.145139 1.000000
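To read the matrix from the target's point of view, the total_cost column can be pulled out and sorted. The sketch below rebuilds the matrix from the printed values above; on the notebook it is simply `Tz.corr(numeric_only=True)['total_cost']`. All four correlations are weak (the largest is about 0.29), which suggests a linear model on these features alone may explain only a modest share of the variation in spending.

```python
import pandas as pd

cols = ["total_female", "total_male", "night_mainland", "night_zanzibar", "total_cost"]

# Correlation matrix copied from the Tz.corr() output above
corr = pd.DataFrame(
    [[1.000000, 0.467000, 0.031233, 0.138523, 0.285862],
     [0.467000, 1.000000, -0.041369, 0.050172, 0.183785],
     [0.031233, -0.041369, 1.000000, -0.118155, 0.020473],
     [0.138523, 0.050172, -0.118155, 1.000000, 0.145139],
     [0.285862, 0.183785, 0.020473, 0.145139, 1.000000]],
    index=cols, columns=cols,
)

# Correlation of each feature with the target, strongest first
target_corr = corr["total_cost"].drop("total_cost").sort_values(ascending=False)
print(target_corr)
```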
In [19]:
sns.heatmap(Tz.corr(numeric_only=True), annot=True)
Out[19]:
[Figure: annotated heatmap of the correlation matrix]

Generating summary EDA checks on the dataset

In [20]:
#import the profiling library (pandas-profiling is now distributed as ydata-profiling)

from ydata_profiling import ProfileReport
profile = ProfileReport(Tz)
profile
Out[20]:

EXPLORATORY DATA ANALYSIS

In [21]:
#Total spending per country, computed on the first 8 records only (head(8)), not on the full dataset

countries = Tz[['country', 'total_cost']].head(8).groupby('country').sum()
countries
Out[21]:
total_cost
country
CHINA 1657500.0
SOUTH AFRICA 466140.0
SWIZERLAND 674602.5
UNITED KINGDOM 14441106.5
UNITED STATES OF AMERICA 3480750.0
In [22]:
#CHART SHOWING TOTAL SPENDING PER COUNTRY (FIRST 8 RECORDS ONLY)

plt.style.use('seaborn-v0_8')  #this style was named 'seaborn' before matplotlib 3.6
countries.plot(kind='bar', figsize=(12,3), color='red', legend=False, title='SPENDING PER COUNTRY (FIRST 8 RECORDS)')
Out[22]:
[Figure: bar chart of total_cost per country for the first 8 records]

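Of the five countries shown above, the United Kingdom has the highest combined spending. This result comes from the first eight records only, four of which are UK visitors, so it is not a ranking over the full dataset.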
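A sketch of the same ranking computed over every row instead of a head() slice. The helper name `top_spending_countries` is hypothetical; it groups by country, sums total_cost and keeps the n largest totals. The demo uses the eight records shown above, so it reproduces Out[21]; on the notebook the call would be `top_spending_countries(Tz)`, whose result is not shown here.

```python
import pandas as pd

def top_spending_countries(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Sum total_cost per country over all rows of df and return the n largest totals."""
    return df.groupby("country")["total_cost"].sum().nlargest(n)

# The first 8 records from the table above
sample = pd.DataFrame({
    "country": ["SWIZERLAND", "UNITED KINGDOM", "UNITED KINGDOM", "UNITED KINGDOM",
                "CHINA", "UNITED KINGDOM", "SOUTH AFRICA", "UNITED STATES OF AMERICA"],
    "total_cost": [674602.5, 3214906.5, 3315000.0, 7790250.0,
                   1657500.0, 120950.0, 466140.0, 3480750.0],
})

print(top_spending_countries(sample))
```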

In [23]:
#COUNTRIES WITH THE MOST TOURISTS VISITING TANZANIA.

Tz[['country']].value_counts()
Out[23]:
country                 
UNITED STATES OF AMERICA    695
UNITED KINGDOM              533
ITALY                       393
FRANCE                      280
ZIMBABWE                    274
                           ... 
ANGOLA                        1
MONTENEGRO                    1
MORROCO                       1
MYANMAR                       1
MADAGASCAR                    1
Length: 105, dtype: int64
In [24]:
Tz[['country']].value_counts().head(20).plot(kind='barh', title='COUNTRIES WITH THE MOST TOURISTS VISITING TANZANIA')
Out[24]:
[Figure: horizontal bar chart of the 20 countries with the most tourists]
In [25]:
#Which age group spends the most, and which travel_with category spends the most overall?
In [26]:
Age = Tz[['age_group','total_cost','travel_with']]
Age
Out[26]:
age_group total_cost travel_with
0 45-64 674602.5 Friends/Relatives
1 25-44 3214906.5 Alone
2 25-44 3315000.0 Alone
3 25-44 7790250.0 Spouse
4 1-24 1657500.0 Alone
... ... ... ...
4804 45-64 3315000.0 Alone
4805 25-44 10690875.0 Spouse
4806 1-24 2246636.7 Alone
4807 25-44 1160250.0 Friends/Relatives
4808 25-44 13260000.0 Spouse

4809 rows × 3 columns

In [27]:
Age.describe(include = 'all')
Out[27]:
age_group total_cost travel_with
count 4809 4.809000e+03 4809
unique 4 NaN 5
top 25-44 NaN Alone
freq 2487 NaN 2379
mean NaN 8.114389e+06 NaN
std NaN 1.222490e+07 NaN
min NaN 4.900000e+04 NaN
25% NaN 8.121750e+05 NaN
50% NaN 3.397875e+06 NaN
75% NaN 9.945000e+06 NaN
max NaN 9.953288e+07 NaN
In [28]:
Age.groupby('age_group')[['total_cost']].sum()/10**6
Out[28]:
total_cost
age_group
1-24 3379.088150
25-44 14987.099938
45-64 15371.839260
65+ 5284.068284
In [29]:
#HIGHEST SPENDING AGE GROUP
Age.groupby('age_group')[['total_cost']].sum().plot(kind='bar',title='HIGHEST SPENDING AGE GROUP')
Out[29]:
[Figure: bar chart 'HIGHEST SPENDING AGE GROUP' of total_cost by age_group]

The 45-64 age group has the highest total spending (about 15,372 million), narrowly ahead of the 25-44 group (about 14,987 million).

In [30]:
Age.groupby('travel_with')[['total_cost']].sum()
Out[30]:
total_cost
travel_with
Alone 8.717835e+09
Children 1.653502e+09
Friends/Relatives 9.158700e+09
Spouse 1.274631e+10
Spouse and Children 6.745753e+09
In [31]:
#Number of tourists in each age group.

Age[['age_group']].value_counts()
Out[31]:
age_group
25-44        2487
45-64        1391
1-24          624
65+           307
dtype: int64
In [32]:
Age[['age_group']].value_counts().plot(kind='pie', autopct='%1.1f%%')
Out[32]:
[Figure: pie chart of the share of tourists in each age group]

The pie chart shows each age group's share of tourists; 25-44 is the largest group, with 2,487 of the 4,809 tourists (about 51.7%).
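Totals mix group size with spending per person. Dividing the group totals from Out[28] by the counts from Out[31] gives the average spend per tourist; on these figures the 65+ group ranks first (about 17.2 million per tourist) even though 45-64 has the largest total. On the notebook the same numbers come from `Age.groupby('age_group')['total_cost'].mean()/10**6`.

```python
import pandas as pd

# Total spending per age group in millions (Out[28]) and tourist counts (Out[31])
total_millions = pd.Series({"1-24": 3379.088150, "25-44": 14987.099938,
                            "45-64": 15371.839260, "65+": 5284.068284})
tourists = pd.Series({"1-24": 624, "25-44": 2487, "45-64": 1391, "65+": 307})

# Average spend per tourist (millions) = group total / number of tourists in the group
avg_millions = (total_millions / tourists).sort_values(ascending=False)
print(avg_millions.round(2))
```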

In [33]:
#COUNTRY OF THE HIGHEST-SPENDING TOURIST AMONG THE FIRST 10 RECORDS
In [34]:
Tz[['country','total_cost']].head(10)
Out[34]:
country total_cost
0 SWIZERLAND 674602.5
1 UNITED KINGDOM 3214906.5
2 UNITED KINGDOM 3315000.0
3 UNITED KINGDOM 7790250.0
4 CHINA 1657500.0
5 UNITED KINGDOM 120950.0
6 SOUTH AFRICA 466140.0
7 UNITED STATES OF AMERICA 3480750.0
8 NIGERIA 994500.0
9 INDIA 2486250.0
In [35]:
Tz[['country','total_cost']].head(10).plot(kind='bar',title='SPENDING OF THE FIRST 10 TOURISTS (BY ROW)')
Out[35]:
[Figure: bar chart of total_cost for rows 0-9]

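Among these first ten records, the highest single spender (row 3, 7,790,250) is from the United Kingdom, which is second only to the USA in the number of tourists visiting Tanzania. Because this is a ten-row sample, it does not establish which country's tourists spend the most overall.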
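"Most spending" can mean the largest single bill, the highest average per visitor or the largest total. A sketch that reports the first two side by side with visitor counts, using the ten records listed above; the column names `visitors`, `mean_spend` and `max_spend` are hypothetical labels, and replacing `sample` with `Tz` would give the full-dataset view.

```python
import pandas as pd

# First 10 records from the table above (country, total_cost)
sample = pd.DataFrame({
    "country": ["SWIZERLAND", "UNITED KINGDOM", "UNITED KINGDOM", "UNITED KINGDOM", "CHINA",
                "UNITED KINGDOM", "SOUTH AFRICA", "UNITED STATES OF AMERICA", "NIGERIA", "INDIA"],
    "total_cost": [674602.5, 3214906.5, 3315000.0, 7790250.0, 1657500.0,
                   120950.0, 466140.0, 3480750.0, 994500.0, 2486250.0],
})

# Visitors, average spend per visitor and largest single spend for each country
summary = (sample.groupby("country")["total_cost"]
           .agg(visitors="count", mean_spend="mean", max_spend="max")
           .sort_values("mean_spend", ascending=False))
print(summary)

# Country of the single highest spender
print(sample.loc[sample["total_cost"].idxmax(), "country"])
```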

In [36]:
#AVERAGE NUMBER OF NIGHTS TOURISTS SPEND ON THE TANZANIA MAINLAND
In [37]:
Tz[['night_mainland']].mean()
Out[37]:
night_mainland    8.488043
dtype: float64

On average, a tourist spends about 8.5 nights on the Tanzanian mainland. The median is 6 nights, so a few very long stays (up to 145 nights) pull the mean up.

In [38]:
#AVERAGE NUMBER OF NIGHTS TOURISTS SPEND IN ZANZIBAR
In [39]:
Tz[['night_zanzibar']].mean()
Out[39]:
night_zanzibar    2.304429
dtype: float64

On average, a tourist spends about 2.3 nights in Zanzibar. The median is 0, so at least half of the tourists in the dataset spend no nights in Zanzibar.
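Because the mean of a sum equals the sum of the means, the average total stay across both destinations is about 8.488043 + 2.304429 = 10.79 nights. A sketch of that combined figure, using the first four records from the dataset table; the column name `total_nights` is a hypothetical addition, not part of the original data.

```python
import pandas as pd

# night_mainland and night_zanzibar for rows 0-3 of the dataset table
stays = pd.DataFrame({"night_mainland": [13.0, 14.0, 1.0, 11.0],
                      "night_zanzibar": [0.0, 7.0, 31.0, 0.0]})

# Hypothetical combined feature: total nights spent in Tanzania
stays["total_nights"] = stays["night_mainland"] + stays["night_zanzibar"]

# Both lines print the same value: mean(a + b) == mean(a) + mean(b)
print(stays["total_nights"].mean())
print(stays["night_mainland"].mean() + stays["night_zanzibar"].mean())
```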

In [40]:
#MOST PREFERRED PAYMENT METHOD AMONG TOURISTS
In [41]:
Tz[['payment_mode']].head(30)
Out[41]:
payment_mode
0 Cash
1 Cash
2 Cash
3 Cash
4 Cash
5 Cash
6 Cash
7 Cash
8 Cash
9 Credit Card
10 Cash
11 Cash
12 Cash
13 Cash
14 Cash
15 Cash
16 Cash
17 Cash
18 Cash
19 Cash
20 Cash
21 Cash
22 Credit Card
23 Cash
24 Cash
25 Cash
26 Credit Card
27 Cash
28 Cash
29 Cash
In [42]:
type(Tz['payment_mode'].iloc[0])
Out[42]:
str
In [43]:
#CHART SHOWING MOST USED PAYMENT MODE BY TOURIST

plt.style.use('ggplot')
sns.catplot(data=Tz, x='payment_mode', kind='count').set(title='PAYMENT MODE USED')
Out[43]:
[Figure: count plot 'PAYMENT MODE USED' of payment_mode]

The chart above indicates that most tourists pay in cash.
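The count plot can be backed by exact shares with `value_counts(normalize=True)`. The sketch below applies it to the 30 payment_mode values listed in Out[41] (Credit Card at rows 9, 22 and 26, Cash elsewhere), which gives 90% cash in that slice; on the full data the call is `Tz['payment_mode'].value_counts(normalize=True)`.

```python
import pandas as pd

# payment_mode for the first 30 records listed above
modes = pd.Series(["Cash"] * 30)
modes.iloc[[9, 22, 26]] = "Credit Card"

# Share of each payment mode (fractions that sum to 1)
shares = modes.value_counts(normalize=True)
print(shares)
```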

In [44]:
#TOURISM MAIN ACTIVITIES IN TANZANIA
In [45]:
Tz[['main_activity']].head(10)
Out[45]:
main_activity
0 Wildlife tourism
1 Cultural tourism
2 Cultural tourism
3 Wildlife tourism
4 Wildlife tourism
5 Wildlife tourism
6 Mountain climbing
7 Wildlife tourism
8 Cultural tourism
9 Wildlife tourism
In [46]:
sns.catplot(data=Tz, y='main_activity', kind='count')
Out[46]:
[Figure: count plot of main_activity]

Wildlife tourism is the most common main activity among tourists, followed by beach tourism.

In [47]:
Tz[['package_food']]
Out[47]:
package_food
0 No
1 No
2 No
3 Yes
4 No
... ...
4804 No
4805 Yes
4806 No
4807 Yes
4808 Yes

4809 rows × 1 columns

In [48]:
sns.catplot(data=Tz, x='package_food', kind='count')
Out[48]:
[Figure: count plot of package_food]

The chart indicates that most tourists do not buy food as part of a package, possibly because they prefer the flexibility of choosing their own meals.
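Whether packaged food relates to spending can be checked by grouping total_cost by package_food. The sketch uses only the ten rows visible in Out[47] together with their total_cost from the first table, so it is far too small to support a conclusion; running the same groupby on `Tz` would be needed for that.

```python
import pandas as pd

# The 10 rows visible above: package_food from Out[47], total_cost from the dataset table
rows = pd.DataFrame({
    "package_food": ["No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes", "Yes"],
    "total_cost": [674602.5, 3214906.5, 3315000.0, 7790250.0, 1657500.0,
                   3315000.0, 10690875.0, 2246636.7, 1160250.0, 13260000.0],
})

# Number of tourists and mean spend with and without packaged food
food_stats = rows.groupby("package_food")["total_cost"].agg(["count", "mean"])
print(food_stats)
```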

DATA PROCESSING

In [49]:
#FEATURE ENGINEERING
#Getting arrays with features to train on
In [50]:
Tz.columns
Out[50]:
Index(['ID', 'country', 'age_group', 'travel_with', 'total_female',
       'total_male', 'purpose', 'main_activity', 'info_source',
       'tour_arrangement', 'package_transport_int', 'package_accomodation',
       'package_food', 'package_transport_tz', 'package_sightseeing',
       'package_guided_tour', 'package_insurance', 'night_mainland',
       'night_zanzibar', 'payment_mode', 'first_trip_tz', 'most_impressing',
       'total_cost'],
      dtype='object')
In [51]:
#PREPARING DATA FOR MODELLING
In [52]:
#features for modelling
X = Tz[['total_female',
       'total_male','night_mainland','night_zanzibar'
       ]]
In [53]:
#Target variable or Prediction variable.

Y = Tz[['total_cost']]
In [54]:
#Train test split

from sklearn.model_selection import train_test_split
In [55]:
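#train_test_split returns (X_train, X_test, y_train, y_test). With test_size=0.9 the first pair is the small (~10%) split,
#so unpacking it into X_test/Y_test means the model is trained on ~90% of the rows and evaluated on the remaining ~10%.
X_test,X_train, Y_test,Y_train = train_test_split(X,Y, test_size=0.9, random_state=60)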
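The split sizes can be worked out by hand. For a float `test_size` with no `train_size`, scikit-learn takes `ceil(test_size * n_samples)` rows for the test portion and the rest for training. The helper `split_sizes` below is a hypothetical sketch of that arithmetic, not a scikit-learn function. It shows why the reversed unpacking above trains on 4,329 rows and holds out 480. The conventional `X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=60)` gives similar proportions (4,328 / 481) but selects different rows, so its coefficients would differ slightly.

```python
import math

def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Hypothetical helper mirroring train_test_split sizing for a float test_size
    and train_size=None: n_test = ceil(test_size * n_samples), n_train = the rest."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

# (first returned split, second returned split) for the 4809-row dataset
print(split_sizes(4809, 0.9))
print(split_sizes(4809, 0.1))
```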

BUILDING A REGRESSION MODEL

In [56]:
#Creating and Training the model

from sklearn.linear_model import LinearRegression
In [57]:
#Instantiate model

lm = LinearRegression()
In [58]:
lm.fit(X_train,Y_train)
Out[58]:
LinearRegression()
In [59]:
print(lm.intercept_)
[4462382.63431905]
In [60]:
lm.coef_
Out[60]:
array([[2071453.64220804,  748866.97405083,   35552.52558308,
         293057.24098337]])
In [61]:
X_train.columns
Out[61]:
Index(['total_female', 'total_male', 'night_mainland', 'night_zanzibar'], dtype='object')
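Each coefficient is the change in predicted total_cost for a one-unit increase in that feature, holding the others fixed. The fitted model adds roughly 2.07 million per additional female traveller, 0.75 million per male traveller, about 35.6 thousand per mainland night and about 293 thousand per Zanzibar night, on top of the intercept. A sketch reproducing one prediction by hand from the printed values: a couple (one female, one male) with 11 mainland nights and no Zanzibar nights. The result matches the value 7673781.03199184 that appears in the prediction array.

```python
import numpy as np

# Intercept and coefficients printed above, in X_train.columns order:
# total_female, total_male, night_mainland, night_zanzibar
intercept = 4462382.63431905
coef = np.array([2071453.64220804, 748866.97405083, 35552.52558308, 293057.24098337])

# Feature vector for a couple staying 11 nights on the mainland and none in Zanzibar
x = np.array([1.0, 1.0, 11.0, 0.0])

# Linear regression prediction: intercept + sum of coefficient * feature
manual_prediction = intercept + coef @ x
print(round(manual_prediction, 2))
```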

EVALUATING OUR MODEL PERFORMANCE IN PREDICTING SPENDING BEHAVIOUR

PREDICTIONS

In [62]:
prediction = lm.predict(X_test)
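Predictions on their own do not measure performance. A minimal sketch of three standard regression metrics (MAE, RMSE and R²) written with NumPy; the helper `regression_metrics` is hypothetical, and scikit-learn's `mean_absolute_error`, `mean_squared_error` and `r2_score` in `sklearn.metrics` compute the same quantities. The toy values are illustrative; on the notebook the call would be `regression_metrics(Y_test, prediction)`, and `ravel()` flattens the 2-D `Y_test` and `prediction` arrays.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of numbers."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))                 # mean absolute error
    rmse = np.sqrt(np.mean(residuals ** 2))          # root mean squared error
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return mae, rmse, r2

# Toy check with hypothetical values
mae, rmse, r2 = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
print(mae, rmse, r2)
```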
In [63]:
#Predicted total_cost for each tourist in the test set
prediction
Out[63]:
array([[ 5317907.18511913],
       [ 5602327.3897838 ],
       [ 9334103.93746147],
       [ 5708984.96653305],
       [ 5389012.2362853 ],
       [ 5957852.64561464],
       [ 5246802.13395296],
       [ 6676046.37885942],
       [ 5424564.76186838],
       [ 5246802.13395296],
       [ 6102326.68475304],
       [ 9582971.61654306],
       [12636660.34162491],
       [ 5282354.65953605],
       [ 7389360.82732717],
       [10458549.12266762],
       [ 5460117.28745147],
       [ 6410394.06156773],
       [ 8778173.26187351],
       [ 5353459.71070221],
       [ 8458200.53162575],
       [ 5424564.76186838],
       [ 9582482.18211578],
       [ 5246802.13395296],
       [ 5531222.33861763],
       [10799390.1423783 ],
       [ 7496018.40407642],
       [ 9334103.93746147],
       [ 5682069.47726866],
       [ 5744537.49211614],
       [ 8988220.09825764],
       [ 6137879.21033613],
       [ 7780438.60874109],
       [ 6960466.58352409],
       [ 9103514.71132558],
       [10600759.22499996],
       [ 5246802.13395296],
       [10316339.02033529],
       [13175806.37900277],
       [ 7531570.9296595 ],
       [ 5566774.86420072],
       [11130778.79163184],
       [ 9334103.93746147],
       [ 6552604.16390007],
       [ 6604941.32769325],
       [ 5353459.71070221],
       [ 7389360.82732717],
       [ 6640493.85327634],
       [21589473.88347934],
       [13477011.22187755],
       [ 5389012.2362853 ],
       [ 8410976.91084744],
       [ 5389012.2362853 ],
       [ 6604941.32769325],
       [ 9041046.69647811],
       [ 5317907.18511913],
       [ 5460117.28745147],
       [10924326.17207325],
       [ 9716544.6825567 ],
       [ 5282354.65953605],
       [ 5282354.65953605],
       [ 6457851.94058388],
       [ 7067124.16027335],
       [ 7531570.9296595 ],
       [ 6747151.43002559],
       [11954345.03367437],
       [ 8703799.89359296],
       [ 7460465.87849334],
       [11385504.62434503],
       [ 7496018.40407642],
       [12596064.99654867],
       [15850674.08726867],
       [ 7709333.55757492],
       [ 8529305.58279192],
       [ 7031571.63469026],
       [10133533.57292671],
       [ 6953323.36194099],
       [14173051.59770791],
       [ 8846009.9959253 ],
       [ 6277825.3758624 ],
       [ 9760734.24445848],
       [ 7673781.03199184],
       [10396081.10782015],
       [ 5282354.65953605],
       [ 6676046.37885942],
       [ 7531570.9296595 ],
       [ 5246802.13395296],
       [ 5424564.76186838],
       [ 5246802.13395296],
       [ 9041046.69647811],
       [ 5460117.28745147],
       [20337828.73177218],
       [ 7884480.83751937],
       [ 9260734.94948924],
       [ 5460117.28745147],
       [11888282.80200136],
       [ 9674129.62303371],
       [ 5317907.18511913],
       [ 5566774.86420072],
       [ 9531919.52070137],
       [ 5282354.65953605],
       [ 6031221.63358688],
       [ 6711598.90444251],
       [10799390.1423783 ],
       [10890058.71444166],
       [11903292.93783269],
       [11982545.59089026],
       [ 6676046.37885942],
       [ 5460117.28745147],
       [ 6889361.53235793],
       [ 5282354.65953605],
       [ 5282354.65953605],
       [ 7166638.5154395 ],
       [ 7546581.06549082],
       [ 7531570.9296595 ],
       [ 5708984.96653305],
       [ 6711598.90444251],
       [ 7638228.50640875],
       [ 6889361.53235793],
       [ 7602675.98082567],
       [ 5708984.96653305],
       [ 7638228.50640875],
       [ 9911091.94868223],
       [17719081.45029487],
       [ 7353808.30174408],
       [ 6640493.85327634],
       [ 9103025.2768983 ],
       [ 7351544.36493802],
       [ 6640493.85327634],
       [ 6889361.53235793],
       [ 9929018.99046302],
       [ 9103025.2768983 ],
       [ 8668247.36800988],
       [10411091.24365147],
       [ 6604941.32769325],
       [11991672.06163624],
       [12048771.3572794 ],
       [ 7031571.63469026],
       [ 7839638.30647419],
       [ 7673781.03199184],
       [ 5317907.18511913],
       [ 5744537.49211614],
       [ 9654076.66770923],
       [ 5282354.65953605],
       [ 9496856.42954557],
       [ 6339289.01040156],
       [ 5282354.65953605],
       [ 8061590.49629139],
       [ 6171167.79911315],
       [ 5424564.76186838],
       [ 4960117.99248222],
       [ 6031221.63358688],
       [ 5317907.18511913],
       [13915546.88230762],
       [ 6815641.13322079],
       [ 8363284.77359345],
       [ 8996857.13457633],
       [ 8747989.45549474],
       [ 6161526.38248614],
       [ 7496018.40407642],
       [ 7673781.03199184],
       [12110234.99181856],
       [ 5460117.28745147],
       [ 7709333.55757492],
       [ 7067124.16027335],
       [12098493.17310162],
       [ 8141332.58377625],
       [ 7493754.46727035],
       [ 5353459.71070221],
       [ 5389012.2362853 ],
       [11567820.63732633],
       [ 5424564.76186838],
       [ 5922300.12003156],
       [ 7262650.29525344],
       [ 7325118.31010091],
       [ 6117336.82058436],
       [ 8454932.21451138],
       [ 6315641.83825155],
       [ 5317907.18511913],
       [11231063.26886847],
       [ 7875190.83205727],
       [ 7318255.776161  ],
       [ 9041046.69647811],
       [ 6747151.43002559],
       [13021854.45795349],
       [ 8996857.13457633],
       [ 9103025.2768983 ],
       [ 5317907.18511913],
       [ 7638228.50640875],
       [ 7353808.30174408],
       [ 5708984.96653305],
       [ 8668247.36800988],
       [ 5353459.71070221],
       [ 6811113.25960866],
       [ 5353459.71070221],
       [ 8961304.60899325],
       [13095574.85709063],
       [ 5353459.71070221],
       [ 5957852.64561464],
       [ 5424564.76186838],
       [18117653.8644568 ],
       [ 6640493.85327634],
       [ 5424564.76186838],
       [ 6676046.37885942],
       [ 5815642.54328231],
       [ 7262650.29525344],
       [ 6640493.85327634],
       [ 8280437.90371033],
       [11231063.26886847],
       [ 5282354.65953605],
       [ 9334103.93746147],
       [ 9135962.45451041],
       [ 7460465.87849334],
       [ 6640493.85327634],
       [ 5531222.33861763],
       [ 7531570.9296595 ],
       [ 6102326.68475304],
       [ 5246802.13395296],
       [ 6782703.95560867],
       [ 9733563.57900464],
       [ 5424564.76186838],
       [ 6330651.97408287],
       [ 5353459.71070221],
       [ 7031571.63469026],
       [ 5637879.91536689],
       [11226020.44937531],
       [ 7582133.59107391],
       [10625130.08981509],
       [ 6889361.53235793],
       [ 6782703.95560867],
       [13291615.93795174],
       [ 7274555.6486865 ],
       [20597622.89543227],
       [ 6359831.40015332],
       [ 7600412.04401961],
       [ 5531222.33861763],
       [ 6517051.63831698],
       [ 7244886.78818876],
       [ 9031920.22573213],
       [ 9254361.84997661],
       [ 5389012.2362853 ],
       [ 5282354.65953605],
       [ 5246802.13395296],
       [ 5317907.18511913],
       [ 5637879.91536689],
       [ 7280439.31377185],
       [ 6960466.58352409],
       [12442113.07549939],
       [ 5353459.71070221],
       [ 7280439.31377185],
       [ 5424564.76186838],
       [ 6640493.85327634],
       [ 7638228.50640875],
       [ 6711598.90444251],
       [ 6223994.39733362],
       [ 7324628.87567363],
       [10082970.9115123 ],
       [11678561.8653284 ],
       [ 5646516.95168558],
       [11781625.22525211],
       [ 5353459.71070221],
       [ 8176885.10935933],
       [ 9361019.42672586],
       [ 6028957.69678081],
       [ 5424564.76186838],
       [ 7424913.35291025],
       [ 7999122.48144391],
       [ 6552604.16390007],
       [ 9022513.06734296],
       [ 5246802.13395296],
       [12101597.95549987],
       [ 5353459.71070221],
       [ 9103025.2768983 ],
       [10538780.64457977],
       [ 7575760.49156128],
       [ 9582971.61654306],
       [ 6351194.36383463],
       [ 9547419.09095998],
       [ 5993405.17119773],
       [ 5317907.18511913],
       [ 6398488.70813466],
       [15048955.00354363],
       [ 8351542.9548765 ],
       [ 8659610.33169119],
       [ 7424913.35291025],
       [ 6315641.83825155],
       [ 9041046.69647811],
       [ 7389360.82732717],
       [ 5246802.13395296],
       [ 5353459.71070221],
       [ 5317907.18511913],
       [ 5389012.2362853 ],
       [15317875.63794969],
       [10280786.4947522 ],
       [ 7111803.15660241],
       [ 5602327.3897838 ],
       [ 7600412.04401961],
       [12953854.18918558],
       [ 5424564.76186838],
       [ 6604941.32769325],
       [ 9334103.93746147],
       [ 7496018.40407642],
       [ 8747989.45549474],
       [ 5282354.65953605],
       [ 9183256.79881044],
       [10568311.48181513],
       [10861368.72279849],
       [ 7496018.40407642],
       [ 7111803.15660241],
       [ 8561589.79126063],
       [14462351.08714961],
       [ 7839638.30647419],
       [ 9334103.93746147],
       [ 7699926.39918576],
       [ 7496018.40407642],
       [ 5211249.60836988],
       [11421057.14992812],
       [ 5460117.28745147],
       [10494591.08267799],
       [ 5353459.71070221],
       [ 8996857.13457633],
       [ 8339637.60144343],
       [ 5708984.96653305],
       [ 6410394.06156773],
       [ 6493404.46616697],
       [ 7744886.08315801],
       [ 5531222.33861763],
       [ 7531570.9296595 ],
       [ 7460465.87849334],
       [ 6889361.53235793],
       [ 5317907.18511913],
       [ 8161874.97352801],
       [ 7887096.18549034],
       [ 7531570.9296595 ],
       [ 6960466.58352409],
       [ 9289914.3755597 ],
       [ 7709333.55757492],
       [ 6173431.73591921],
       [ 7280439.31377185],
       [ 7460465.87849334],
       [ 5753174.52843483],
       [ 6925403.49236829],
       [ 9334103.93746147],
       [ 9538782.05464128],
       [14133271.58677011],
       [12387792.66254332],
       [ 7661875.67855877],
       [ 9103514.71132558],
       [ 5282354.65953605],
       [12154424.55372034],
       [ 5317907.18511913],
       [ 6747151.43002559],
       [ 5531222.33861763],
       [ 5282354.65953605],
       [ 8454932.21451138],
       [ 8363284.77359345],
       [ 9887444.77653221],
       [ 5317907.18511913],
       [ 7709333.55757492],
       [ 6676046.37885942],
       [ 9662224.26960064],
       [ 7602675.98082567],
       [21706894.41009097],
       [ 6386746.88941772],
       [ 7460465.87849334],
       [ 7344401.14335492],
       [ 8834104.64249223],
       [ 8161874.97352801],
       [ 8541047.40150887],
       [ 5780090.01769922],
       [ 8668247.36800988],
       [ 5566774.86420072],
       [ 9281277.339241  ],
       [ 5995669.10800379],
       [ 5353459.71070221],
       [ 9334103.93746147],
       [ 6031221.63358688],
       [ 6066774.15916996],
       [ 8810457.47034222],
       [ 8925752.08341016],
       [11678561.8653284 ],
       [ 7999122.48144391],
       [ 7531570.9296595 ],
       [ 8339637.60144343],
       [ 5708984.96653305],
       [ 6173431.73591921],
       [ 6604941.32769325],
       [ 8349279.01807044],
       [ 8461305.31402401],
       [ 5637879.91536689],
       [ 7839638.30647419],
       [ 5531222.33861763],
       [ 5246802.13395296],
       [ 5282354.65953605],
       [ 6640493.85327634],
       [ 6676046.37885942],
       [11195510.74328538],
       [ 6640493.85327634],
       [ 6996019.10910718],
       [ 6676046.37885942],
       [ 5424564.76186838],
       [ 6676046.37885942],
       [ 9582971.61654306],
       [ 5637879.91536689],
       [11654425.2587511 ],
       [ 6604941.32769325],
       [ 7067613.59470063],
       [ 5246802.13395296],
       [ 7804085.78089111],
       [ 7067124.16027335],
       [ 5895384.63076717],
       [ 9831839.29562465],
       [13241053.27653734],
       [ 8552952.75494194],
       [ 6604941.32769325],
       [ 6711598.90444251],
       [ 8454932.21451138],
       [ 5939574.19266894],
       [ 6853809.00677484],
       [10194742.03127643],
       [ 6782703.95560867],
       [12952105.19826053],
       [ 9745724.10862716],
       [ 9567472.04628446],
       [ 5246802.13395296],
       [ 5637879.91536689],
       [26903467.72825819],
       [ 6552604.16390007],
       [ 6676046.37885942],
       [ 6031221.63358688],
       [ 5939574.19266894],
       [ 8176885.10935933],
       [ 6569388.80211017],
       [ 5460117.28745147],
       [ 5424564.76186838],
       [ 9582971.61654306],
       [ 5246802.13395296],
       [ 9334103.93746147],
       [ 8588505.28052502],
       [ 5317907.18511913],
       [ 5708984.96653305],
       [ 5575411.90051941],
       [ 6782703.95560867],
       [ 8286811.00322296],
       [ 7780438.60874109],
       [ 6711598.90444251],
       [ 8668247.36800988],
       [11275252.83077024],
       [ 6640493.85327634],
       [11848972.52487662],
       [ 7638228.50640875],
       [ 6676046.37885942],
       [ 6102326.68475304],
       [ 7280928.74819913],
       [ 9254361.84997661],
       [ 8996367.70014905],
       [ 5317907.18511913],
       [ 7496018.40407642],
       [ 5531222.33861763],
       [10390548.85389971],
       [19268010.15861903],
       [ 5708984.96653305],
       [ 7424913.35291025],
       [ 5389012.2362853 ],
       [15604070.34499315],
       [ 8925262.64898288],
       [ 7318255.776161  ],
       [ 5424564.76186838],
       [ 5317907.18511913],
       [ 8260384.94838585],
       [ 8854157.59781671],
       [ 8747989.45549474],
       [ 8925752.08341016],
       [ 5173433.14598073],
       [ 8552952.75494194],
       [ 9334103.93746147],
       [ 7353808.30174408],
       [ 7531570.9296595 ],
       [ 5282354.65953605],
       [ 6640493.85327634]])
In [64]:
#Y_test holds the actual (true) total_cost values for the held-out test rows
Y_test
Out[64]:
total_cost
4284 130000.0
3401 8453250.0
3204 7293000.0
3362 497250.0
3822 6132750.0
... ...
2147 6298500.0
1418 3480750.0
3654 8619000.0
3137 600000.0
2253 497250.0

480 rows × 1 columns

In [65]:
#Plotting predicted against actual total_cost (both in the same units, so no rescaling)
plt.scatter(Y_test, prediction)
Out[65]:
<matplotlib.collections.PathCollection at 0x2431ae2a1f0>
In [66]:
#Plotting the distribution of the residuals (actual - predicted, same units)
residuals = (Y_test.values - prediction).ravel()
sns.histplot(residuals, kde=True, stat="density")
Out[66]:
<AxesSubplot:ylabel='Density'>
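The scatter plot and the residual histogram can be combined into one diagnostic. Below is a minimal sketch. It assumes that the true values and the predictions are in the same units, with no division by 10**6. `residual_plots` is a hypothetical helper name, and the toy numbers stand in for `Y_test` and `prediction`. The dashed y = x line marks where a perfect model's points would fall. The sketch selects matplotlib's non-interactive Agg backend so it runs as a standalone script. Drop that line inside the notebook, where `%matplotlib inline` handles display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for a standalone script; omit inside the notebook
import matplotlib.pyplot as plt


def residual_plots(y_true, y_pred):
    """Return residuals (actual - predicted) and a two-panel diagnostic figure."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred  # both arrays in the units of total_cost, no rescaling

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    # Left: actual vs predicted, with the y = x line a perfect model would follow
    ax1.scatter(y_true, y_pred, alpha=0.5)
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax1.plot([lo, hi], [lo, hi], "r--", label="y = x")
    ax1.set_xlabel("actual total_cost")
    ax1.set_ylabel("predicted total_cost")
    ax1.legend()
    # Right: distribution of residuals; ideally centred on 0 and roughly symmetric
    ax2.hist(residuals, bins=30, density=True)
    ax2.set_xlabel("residual (actual - predicted)")
    ax2.set_ylabel("density")
    return residuals, fig


# Toy numbers; in the notebook this would be residual_plots(Y_test, prediction)
residuals, fig = residual_plots([100.0, 200.0, 300.0], [[110.0], [190.0], [330.0]])
print(residuals)  # [-10.  10. -30.]
```

Points far above the dashed line mean the model overestimates spending for those tourists. A residual histogram shifted away from 0 points to a systematic bias.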

EVALUATING THE MODEL WITH LOSS FUNCTIONS TO MEASURE ITS ERRORS

REGRESSION EVALUATION METRICS:

In [67]:
#Importing scikit-learn's regression evaluation metrics

from sklearn import metrics
In [68]:
#Mean absolute error (MAE): average size of the prediction errors, in the units of total_cost
metrics.mean_absolute_error(Y_test,prediction)
Out[68]:
7232385.834148328
In [69]:
#Mean squared error (MSE): squaring the errors penalises large errors more heavily

metrics.mean_squared_error(Y_test,prediction)
Out[69]:
121440898468113.73
In [70]:
#Root mean squared error (RMSE): square root of the MSE, back in the units of total_cost

np.sqrt(metrics.mean_squared_error(Y_test,prediction))
Out[70]:
11020022.616497379
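The three metrics above can be written out directly with NumPy. The sketch below does this on a small made-up example, where `y_true` and `y_pred` are illustrative values and not taken from the dataset. It shows what `metrics.mean_absolute_error`, `metrics.mean_squared_error` and the square root of the MSE compute. It also explains why the RMSE output is exactly the square root of the MSE output: √121440898468113.73 ≈ 11020022.6.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """MAE, MSE and RMSE computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_true - y_pred
    mae = np.mean(np.abs(err))  # average absolute error
    mse = np.mean(err ** 2)     # squaring weights large errors more
    rmse = np.sqrt(mse)         # back in the units of the target
    return mae, mse, rmse


y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
mae, mse, rmse = regression_errors(y_true, y_pred)
print(mae, mse)  # 0.875 1.3125
```

Here a single error of 2 contributes 4 of the 5.25 total squared error. This is why MSE and RMSE react more strongly than MAE to a few very large misses.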
In [71]:
#R^2 of the model on X and Y
r_squared = lm.score(X, Y)
In [72]:
print(r_squared)
0.09682259179087571
In [73]:
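#Caution: this scores the model against its own predictions, so R^2 is always 1.0.
#It does not measure accuracy; lm.score(X_test, Y_test) would give the real test R^2.
r_squared = lm.score(X_test,prediction)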
In [74]:
print(r_squared)
1.0
In [75]:
from sklearn.metrics import r2_score
In [76]:
#R^2 on the held-out test set: predictions compared with the true costs
r2_score(Y_test,prediction)
Out[76]:
0.1787447445823661
In [77]:
r2_score(Y_test,prediction).dtype
Out[77]:
dtype('float64')

On the held-out test set the model reaches an R² of about 0.18. It therefore explains only about 18% of the variance in total_cost, and about 0.097 on X and Y.
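R² compares the model's squared errors with those of a baseline that always predicts the mean: R² = 1 − SS_res / SS_tot. Below is a minimal NumPy sketch on made-up numbers. It also shows why `lm.score(X_test, prediction)` printed 1.0. scikit-learn's `LinearRegression.score(X, y)` returns the R² of `self.predict(X)` against `y`. Passing the model's own predictions as `y` therefore makes SS_res zero.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(r_squared(y_true, y_pred))  # a real score: predictions vs true values
print(r_squared(y_pred, y_pred))  # 1.0: predictions compared with themselves
```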

TESTING THE MODEL ON A NEW DATASET

Loading another dataset, Test .csv, to try the model on new data:

In [78]:
#Note: SVC (a support vector classifier) is imported here but not used below
from sklearn.svm import SVC
In [79]:
tz = pd.read_csv('Test .csv')
tz
Out[79]:
ID country age_group travel_with total_female total_male purpose main_activity info_source tour_arrangement ... package_food package_transport_tz package_sightseeing package_guided_tour package_insurance night_mainland night_zanzibar payment_mode first_trip_tz most_impressing
0 tour_1 AUSTRALIA 45-64 Spouse 1.0 1.0 Leisure and Holidays Wildlife tourism Travel, agent, tour operator Package Tour ... Yes Yes Yes Yes Yes 10 3 Cash Yes Wildlife
1 tour_100 SOUTH AFRICA 25-44 Friends/Relatives 0.0 4.0 Business Wildlife tourism Tanzania Mission Abroad Package Tour ... No No No No No 13 0 Cash No Wonderful Country, Landscape, Nature
2 tour_1001 GERMANY 25-44 Friends/Relatives 3.0 0.0 Leisure and Holidays Beach tourism Friends, relatives Independent ... No No No No No 7 14 Cash No No comments
3 tour_1006 CANADA 24-Jan Friends/Relatives 2.0 0.0 Leisure and Holidays Cultural tourism others Independent ... No No No No No 0 4 Cash Yes Friendly People
4 tour_1009 UNITED KINGDOM 45-64 Friends/Relatives 2.0 2.0 Leisure and Holidays Wildlife tourism Friends, relatives Package Tour ... Yes Yes No No No 10 0 Cash Yes Friendly People
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1596 tour_988 UNITED STATES OF AMERICA 25-44 NaN 0.0 1.0 Meetings and Conference Mountain climbing Newspaper, magazines,brochures Independent ... No No No No No 1 0 Cash No NaN
1597 tour_990 ITALY 45-64 Spouse and Children 3.0 1.0 Leisure and Holidays Wildlife tourism Friends, relatives Package Tour ... Yes Yes Yes No No 10 5 Other Yes Wildlife
1598 tour_992 FINLAND 25-44 Alone 0.0 1.0 Meetings and Conference Mountain climbing Friends, relatives Independent ... No No No No No 6 0 Cash Yes No comments
1599 tour_996 SOUTH AFRICA 24-Jan Alone 0.0 1.0 Business Beach tourism Friends, relatives Independent ... No No No No No 4 0 Cash Yes Wildlife
1600 tour_998 SOUTH AFRICA 25-44 Spouse 1.0 1.0 Leisure and Holidays Cultural tourism Radio, TV, Web Independent ... No No No No No 9 5 Cash Yes Friendly People

1601 rows × 22 columns

In [80]:
#Correlation between the numeric columns of the test set
tz.corr(numeric_only=True)
Out[80]:
total_female total_male night_mainland night_zanzibar
total_female 1.000000 0.288933 0.015265 0.078020
total_male 0.288933 1.000000 -0.035880 0.020622
night_mainland 0.015265 -0.035880 1.000000 0.516262
night_zanzibar 0.078020 0.020622 0.516262 1.000000
In [81]:
sns.heatmap(tz.corr(numeric_only=True), annot=True)
Out[81]:
<AxesSubplot:>
In [82]:
#Refitting the model on X and Y (if X contains the X_test rows, they are no longer held out)
lm.fit(X,Y)
Out[82]:
LinearRegression()
In [83]:
#Predicting on X_test, the held-out split of Train .csv (not the Test .csv data loaded above)
Y_pred = lm.predict(X_test)
In [84]:
Y_pred
Out[84]:
array([[ 5082921.44689458],
       [ 5356361.05250342],
       [ 9505993.91042197],
       [ 5458900.90460673],
       [ 5151281.34829679],
       [ 5698160.55951447],
       [ 5014561.54549237],
       [ 6635229.06918205],
       [ 5185461.29899789],
       [ 5014561.54549237],
       [ 5851962.15137201],
       [ 9745253.56532971],
       [12918546.98377004],
       [ 5048741.49619347],
       [ 7335909.87225727],
       [10563018.70630403],
       [ 5219641.249699  ],
       [ 6261982.81835893],
       [ 8686009.73865349],
       [ 5117101.39759568],
       [ 8378390.18234354],
       [ 5185461.29899789],
       [ 9776798.26095763],
       [ 5014561.54549237],
       [ 5288001.15110121],
       [11129296.68918484],
       [ 7438449.72436059],
       [ 9505993.91042197],
       [ 5475941.90404936],
       [ 5493080.85530784],
       [ 9044711.50368083],
       [ 5886142.10207311],
       [ 7711889.32996943],
       [ 6908668.67479089],
       [ 9198472.30592788],
       [10699738.50910845],
       [ 5014561.54549237],
       [10426298.90349961],
       [13604544.55962314],
       [ 7472629.67506169],
       [ 5322181.10180231],
       [11366059.83044573],
       [ 9505993.91042197],
       [ 6398702.62116335],
       [ 6566869.16777984],
       [ 5117101.39759568],
       [ 7335909.87225727],
       [ 6601049.11848095],
       [22294929.97428402],
       [14011970.76114736],
       [ 5151281.34829679],
       [ 8056577.15789072],
       [ 5151281.34829679],
       [ 6566869.16777984],
       [ 9181333.3546694 ],
       [ 5082921.44689458],
       [ 5219641.249699  ],
       [11163574.5917018 ],
       [ 9830752.4179904 ],
       [ 5048741.49619347],
       [ 5048741.49619347],
       [ 6193761.65838306],
       [ 7011208.52689421],
       [ 7472629.67506169],
       [ 6703588.97058426],
       [12325497.01190767],
       [ 8771271.89807199],
       [ 7404269.77365948],
       [11778617.80068999],
       [ 7438449.72436059],
       [12989362.6092086 ],
       [16501681.53816046],
       [ 7643529.42856722],
       [ 8446750.08374575],
       [ 6977028.5761931 ],
       [10360394.72613375],
       [ 6655199.17914541],
       [14610209.66393512],
       [ 8907991.70087641],
       [ 6005780.11582442],
       [ 9916153.31883524],
       [ 7609349.47786611],
       [10545879.75504555],
       [ 5048741.49619347],
       [ 6635229.06918205],
       [ 7472629.67506169],
       [ 5014561.54549237],
       [ 5185461.29899789],
       [ 5014561.54549237],
       [ 9181333.3546694 ],
       [ 5219641.249699  ],
       [21186545.4868319 ],
       [ 7595123.2645335 ],
       [ 9420552.21996665],
       [ 5219641.249699  ],
       [12152141.53436579],
       [ 9793978.0018266 ],
       [ 5082921.44689458],
       [ 5322181.10180231],
       [ 9657258.19902218],
       [ 5048741.49619347],
       [ 5783602.2499698 ],
       [ 6669409.01988316],
       [11129296.68918484],
       [11209565.82130965],
       [12237501.64560014],
       [12388627.19277401],
       [ 6635229.06918205],
       [ 5219641.249699  ],
       [ 6840308.77338868],
       [ 5048741.49619347],
       [ 5048741.49619347],
       [ 6860278.88335204],
       [ 7557989.78629604],
       [ 7472629.67506169],
       [ 5458900.90460673],
       [ 6669409.01988316],
       [ 7575169.52716501],
       [ 6840308.77338868],
       [ 7540989.5764639 ],
       [ 5458900.90460673],
       [ 7575169.52716501],
       [10135638.76741131],
       [18164793.83984446],
       [ 7301729.92155617],
       [ 6601049.11848095],
       [ 9230017.0015558 ],
       [ 7284648.13250305],
       [ 6601049.11848095],
       [ 6840308.77338868],
       [ 9865152.68933882],
       [ 9230017.0015558 ],
       [ 8737091.94737088],
       [10631239.8662799 ],
       [ 6566869.16777984],
       [12408303.44728982],
       [12220599.38758385],
       [ 6977028.5761931 ],
       [ 7882650.34204862],
       [ 7609349.47786611],
       [ 5082921.44689458],
       [ 5493080.85530784],
       [ 9813613.46673192],
       [ 5048741.49619347],
       [ 9591533.55269315],
       [ 6193622.91695672],
       [ 5048741.49619347],
       [ 8138950.99639898],
       [ 5903240.2637211 ],
       [ 5185461.29899789],
       [ 4724040.15083041],
       [ 5783602.2499698 ],
       [ 5082921.44689458],
       [14319729.05888365],
       [ 6552642.95444723],
       [ 8514832.50229529],
       [ 9095932.45382456],
       [ 8856672.79891683],
       [ 6022723.16345119],
       [ 7438449.72436059],
       [ 7609349.47786611],
       [12408442.18871615],
       [ 5219641.249699  ],
       [ 7643529.42856722],
       [ 7011208.52689421],
       [12169460.0166611 ],
       [ 8258531.84794492],
       [ 7421367.93530747],
       [ 5117101.39759568],
       [ 5151281.34829679],
       [11876066.67368377],
       [ 5185461.29899789],
       [ 5663980.60881336],
       [ 7253005.48505928],
       [ 7270144.43631776],
       [ 5937322.26260636],
       [ 8532012.24316425],
       [ 6057041.85557864],
       [ 5082921.44689458],
       [11434460.52145843],
       [ 7916830.29274972],
       [ 7267549.97085506],
       [ 9181333.3546694 ],
       [ 6703588.97058426],
       [13228842.58476366],
       [ 9095932.45382456],
       [ 9230017.0015558 ],
       [ 5082921.44689458],
       [ 7575169.52716501],
       [ 7301729.92155617],
       [ 5458900.90460673],
       [ 8737091.94737088],
       [ 5117101.39759568],
       [ 6518479.37634099],
       [ 5117101.39759568],
       [ 9061752.50312346],
       [13516508.40370511],
       [ 5117101.39759568],
       [ 5698160.55951447],
       [ 5185461.29899789],
       [18996466.92168823],
       [ 6601049.11848095],
       [ 5185461.29899789],
       [ 6635229.06918205],
       [ 5561440.75671005],
       [ 7253005.48505928],
       [ 6601049.11848095],
       [ 8207490.42883802],
       [11434460.52145843],
       [ 5048741.49619347],
       [ 9505993.91042197],
       [ 9044891.03471766],
       [ 7404269.77365948],
       [ 6601049.11848095],
       [ 5288001.15110121],
       [ 7472629.67506169],
       [ 5851962.15137201],
       [ 5014561.54549237],
       [ 6737768.92128537],
       [ 9574704.82947709],
       [ 5185461.29899789],
       [ 6142401.96681299],
       [ 5117101.39759568],
       [ 6977028.5761931 ],
       [ 5390541.00320452],
       [11539456.09759811],
       [ 7592169.73699715],
       [10414740.69042992],
       [ 6840308.77338868],
       [ 6737768.92128537],
       [13619146.20762428],
       [ 7150604.3743823 ],
       [21386493.4223185 ],
       [ 6142442.75642348],
       [ 7523907.78741079],
       [ 5288001.15110121],
       [ 6364522.67046225],
       [ 7182108.28039974],
       [ 9161657.10015359],
       [ 9386413.05887603],
       [ 5151281.34829679],
       [ 5048741.49619347],
       [ 5014561.54549237],
       [ 5082921.44689458],
       [ 5390541.00320452],
       [ 7216288.23110084],
       [ 6908668.67479089],
       [12613660.63434912],
       [ 5117101.39759568],
       [ 7216288.23110084],
       [ 5185461.29899789],
       [ 6601049.11848095],
       [ 7575169.52716501],
       [ 6669409.01988316],
       [ 6039862.11470967],
       [ 7301689.13194568],
       [10240854.6641983 ],
       [12103278.35644257],
       [ 5441761.95334826],
       [12049601.68226248],
       [ 5117101.39759568],
       [ 8292711.79864603],
       [ 9488952.91097935],
       [ 5766520.46091668],
       [ 5185461.29899789],
       [ 7370089.82295838],
       [ 8121812.0451405 ],
       [ 6398702.62116335],
       [ 8891105.81545499],
       [ 5014561.54549237],
       [12357221.23857242],
       [ 5117101.39759568],
       [ 9230017.0015558 ],
       [10651054.86222205],
       [ 7558030.57590653],
       [ 9745253.56532971],
       [ 6091221.80627974],
       [ 9711073.61462861],
       [ 5732340.51021557],
       [ 5082921.44689458],
       [ 6364383.92903591],
       [15737813.39201355],
       [ 8275850.33024023],
       [ 8685870.99722715],
       [ 7370089.82295838],
       [ 6057041.85557864],
       [ 9181333.3546694 ],
       [ 7335909.87225727],
       [ 5014561.54549237],
       [ 5117101.39759568],
       [ 5082921.44689458],
       [ 5151281.34829679],
       [15957437.58201596],
       [10392118.9527985 ],
       [ 7065064.73211113],
       [ 5356361.05250342],
       [ 7523907.78741079],
       [13348243.90527277],
       [ 5185461.29899789],
       [ 6566869.16777984],
       [ 9505993.91042197],
       [ 7438449.72436059],
       [ 8856672.79891683],
       [ 5048741.49619347],
       [ 9318053.15747382],
       [10853319.78031867],
       [11177980.33607124],
       [ 7438449.72436059],
       [ 7065064.73211113],
       [ 8634552.09526757],
       [15120036.97613632],
       [ 7882650.34204862],
       [ 9505993.91042197],
       [ 7372978.14386862],
       [ 7438449.72436059],
       [ 4980381.59479126],
       [11812797.7513911 ],
       [ 5219641.249699  ],
       [10565653.96137721],
       [ 5117101.39759568],
       [ 9095932.45382456],
       [ 8378251.44091721],
       [ 5458900.90460673],
       [ 6261982.81835893],
       [ 6227941.60908416],
       [ 7677709.37926832],
       [ 5288001.15110121],
       [ 7472629.67506169],
       [ 7404269.77365948],
       [ 6840308.77338868],
       [ 5082921.44689458],
       [ 8207351.68741168],
       [ 7814429.18207274],
       [ 7472629.67506169],
       [ 6908668.67479089],
       [ 9420593.00957714],
       [ 7643529.42856722],
       [ 5920322.05277422],
       [ 7216288.23110084],
       [ 7404269.77365948],
       [ 5544301.80545157],
       [ 6842944.02846187],
       [ 9505993.91042197],
       [ 9659852.66448487],
       [14959319.18084988],
       [12679287.3288623 ],
       [ 7711750.58854309],
       [ 9198472.30592788],
       [ 5048741.49619347],
       [12493843.08956099],
       [ 5082921.44689458],
       [ 6703588.97058426],
       [ 5288001.15110121],
       [ 5048741.49619347],
       [ 8532012.24316425],
       [ 8514832.50229529],
       [ 9999057.70603323],
       [ 5082921.44689458],
       [ 7643529.42856722],
       [ 6635229.06918205],
       [ 9896379.11250358],
       [ 7540989.5764639 ],
       [22699541.38969824],
       [ 6125401.75698085],
       [ 7404269.77365948],
       [ 7031178.63685757],
       [ 9010392.81155339],
       [ 8207351.68741168],
       [ 8685732.25580081],
       [ 5527260.80600894],
       [ 8737091.94737088],
       [ 5322181.10180231],
       [ 9369372.0594334 ],
       [ 5749422.29926869],
       [ 5117101.39759568],
       [ 9505993.91042197],
       [ 5783602.2499698 ],
       [ 5817782.2006709 ],
       [ 8873811.7501753 ],
       [ 9027572.55242235],
       [12103278.35644257],
       [ 8121812.0451405 ],
       [ 7472629.67506169],
       [ 8378251.44091721],
       [ 5458900.90460673],
       [ 5920322.05277422],
       [ 6566869.16777984],
       [ 8258768.54118711],
       [ 8566151.40425487],
       [ 5390541.00320452],
       [ 7882650.34204862],
       [ 5288001.15110121],
       [ 5014561.54549237],
       [ 5048741.49619347],
       [ 6601049.11848095],
       [ 6635229.06918205],
       [11400280.57075733],
       [ 6601049.11848095],
       [ 6942848.625492  ],
       [ 6635229.06918205],
       [ 5185461.29899789],
       [ 6635229.06918205],
       [ 9745253.56532971],
       [ 5390541.00320452],
       [11998241.99069241],
       [ 6566869.16777984],
       [ 6979663.83126629],
       [ 5014561.54549237],
       [ 7848470.39134751],
       [ 7011208.52689421],
       [ 5681021.60825599],
       [ 9984513.22023745],
       [13499606.14568882],
       [ 8583331.14512384],
       [ 6566869.16777984],
       [ 6669409.01988316],
       [ 8532012.24316425],
       [ 5766422.50910083],
       [ 6806128.82268758],
       [10189748.03846529],
       [ 6737768.92128537],
       [13192002.96197375],
       [ 9830793.20760089],
       [ 9691438.14972328],
       [ 5014561.54549237],
       [ 5390541.00320452],
       [28504021.7220076 ],
       [ 6398702.62116335],
       [ 6635229.06918205],
       [ 5783602.2499698 ],
       [ 5766422.50910083],
       [ 8292711.79864603],
       [ 6532689.21707874],
       [ 5219641.249699  ],
       [ 5185461.29899789],
       [ 9745253.56532971],
       [ 5014561.54549237],
       [ 9505993.91042197],
       [ 8617511.09582494],
       [ 5082921.44689458],
       [ 5458900.90460673],
       [ 5373402.05194605],
       [ 6737768.92128537],
       [ 8241629.58992863],
       [ 7711889.32996943],
       [ 6669409.01988316],
       [ 8737091.94737088],
       [11519861.42230327],
       [ 6601049.11848095],
       [12303128.34011332],
       [ 7575169.52716501],
       [ 6635229.06918205],
       [ 5851962.15137201],
       [ 7184743.53547292],
       [ 9386413.05887603],
       [ 9127477.14945249],
       [ 5082921.44689458],
       [ 7438449.72436059],
       [ 5288001.15110121],
       [10682420.02681314],
       [20207154.56800148],
       [ 5458900.90460673],
       [ 7370089.82295838],
       [ 5151281.34829679],
       [16279503.67230584],
       [ 9059117.24805027],
       [ 7267549.97085506],
       [ 5185461.29899789],
       [ 5082921.44689458],
       [ 8227125.89374334],
       [ 8990757.34664806],
       [ 8856672.79891683],
       [ 9027572.55242235],
       [ 4929119.85503704],
       [ 8583331.14512384],
       [ 9505993.91042197],
       [ 7301729.92155617],
       [ 7472629.67506169],
       [ 5048741.49619347],
       [ 6601049.11848095]])
In [85]:
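#Note: accuracy_score is a classification metric and does not apply to regression.
#Scoring the model against its own predictions (Y_pred) always returns R^2 = 1.0,
#so this value says nothing about predictive accuracy.
accuracy = lm.score(X_test,Y_pred)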
In [86]:
accuracy
Out[86]:
1.0

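The 1.0 above is not an accuracy figure. lm.score(X_test, Y_pred) computes R² between the model's predictions for X_test and Y_pred, which are those same predictions, so the result is 1.0 by construction. Moreover, Y_pred was computed on X_test, the 480 held-out rows of Train .csv shown below, not on the 1,601 rows of Test .csv. The model had also just been refit on X and Y, which appear to include those rows. Test .csv has no total_cost column, so predictions on it cannot be scored at all. The meaningful estimate of performance remains the held-out R² of about 0.18 reported earlier.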
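A sound workflow scores the model only against true costs it has not seen. It then predicts Test .csv using the same four feature columns as X_test. Below is a minimal sketch on synthetic stand-in DataFrames. The names `train`, `test` and `FEATURES` and the generated numbers are hypothetical, and filling missing counts with 0 is an assumption, not something the notebook specifies.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

FEATURES = ["total_female", "total_male", "night_mainland", "night_zanzibar"]

# Hypothetical stand-ins for Train .csv and Test .csv (synthetic numbers, not the real data)
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.integers(0, 10, size=(200, 4)).astype(float), columns=FEATURES)
train["total_cost"] = (500_000 * train["night_mainland"]
                       + 300_000 * train["total_male"]
                       + rng.normal(0, 1e5, 200))
test = pd.DataFrame(rng.integers(0, 10, size=(20, 4)).astype(float), columns=FEATURES)  # no total_cost

# 1) Honest evaluation: fit on one part of the labelled data, score on the rest
X_fit, X_hold, y_fit, y_hold = train_test_split(
    train[FEATURES], train["total_cost"], test_size=0.1, random_state=42)
model = LinearRegression().fit(X_fit, y_fit)
holdout_r2 = model.score(X_hold, y_hold)  # compares predictions with TRUE costs

# 2) Refit on all labelled rows, then predict the unlabelled test file
final_model = LinearRegression().fit(train[FEATURES], train["total_cost"])
test_pred = final_model.predict(test[FEATURES].fillna(0))  # fillna(0) is an assumption
print(round(holdout_r2, 3), test_pred.shape)
```

The held-out score from step 1 is the number to report. The predictions from step 2 can be submitted or summarised, but without true costs they cannot be scored.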

In [87]:
X_test
Out[87]:
total_female total_male night_mainland night_zanzibar
4284 0.0 1.0 3.0 0.0
3401 0.0 1.0 11.0 0.0
3204 1.0 1.0 0.0 7.0
3362 0.0 1.0 14.0 0.0
3822 0.0 1.0 5.0 0.0
... ... ... ... ...
2147 1.0 1.0 0.0 7.0
1418 1.0 1.0 2.0 0.0
3654 1.0 1.0 7.0 0.0
3137 0.0 1.0 2.0 0.0
2253 1.0 0.0 3.0 0.0

480 rows × 4 columns

In [88]:
#Total predicted spending over the 480 rows of X_test
Y_pred.sum()
Out[88]:
3773139286.330149

Actionable Insights and Recommendations

Besides the observations listed in the markdown text, the tourism sector needs to pay more attention to tourism activities that currently attract few tourists, such as:

  1. Business tourism, which could broaden the industry's reach.
  2. Conference tourism, a major publicity channel. Upgrading and optimising conference venues would attract more interesting and engaging events.
  3. Wildlife tourism, by contrast, has seen strong activity thanks to experienced and skilful management. It would be worth using that experience to support colleagues in other parts of the tourism sector.

Secondly, most tourists prefer to pay in cash. This is risky given the exchange rate of the Tanzanian shilling against European, US and other currencies, since tourists end up carrying large sums of money. Encouraging electronic payment would be safer and may offer a better return on investment. Lastly, tourists mostly prefer non-packaged food for its flexibility, taste and freshness, so its expansion should be given more priority.

NOTE: The linear regression model chosen for this project does not fit the dataset well, with an R² of about 0.18 on the held-out test set. Its residuals also do not follow a regular pattern, which may limit the model's reliability and performance on future data. Other regression models, such as tree-based models (decision trees, random forests), may be worth trying in future work.
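Below is a minimal sketch of how such a comparison could be run with 5-fold cross-validated R². It assumes scikit-learn's `DecisionTreeRegressor` and `RandomForestRegressor` as the tree-based candidates. The data are synthetic and deliberately non-linear, with a jump in cost above 7 mainland nights, so the ranking this prints says nothing about the Tanzania data. To compare on the real data, pass the notebook's X and a flattened Y (for example `np.ravel(Y)`) instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic, non-linear stand-in for the four features (not the real survey data)
rng = np.random.default_rng(0)
X = rng.integers(0, 15, size=(300, 4)).astype(float)
y = 1e6 * np.where(X[:, 2] > 7, 3.0, 1.0) + 2e5 * X[:, 1] + rng.normal(0, 1e5, 300)

models = {
    "linear": LinearRegression(),
    "decision tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
# Mean R^2 over 5 folds; each fold is scored on rows the model was not trained on
scores = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
          for name, m in models.items()}
for name, score in scores.items():
    print(f"{name:>13}: mean R^2 = {score:.3f}")
```

Cross-validation averages several held-out scores instead of relying on a single split. This makes it a fairer basis for choosing between models than one train/test R².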

In [ ]: